Abstract

Consider the semiparametric transformation model $\Lambda_{\theta_0}(Y) = m(X) + \varepsilon$, where $\theta_0$ is an unknown
finite dimensional parameter, the functions $\Lambda_{\theta_0}$ and $m$ are smooth, $\varepsilon$ is independent of $X$, and $E(\varepsilon) = 0$.
We propose a kernel-type estimator of the density of the error $\varepsilon$, and prove its asymptotic normality. The
estimated errors, which lie at the basis of this estimator, are obtained from a profile likelihood estimator
of $\theta_0$ and a nonparametric kernel estimator of $m$. The practical performance of the proposed density
estimator is evaluated in a simulation study.
1 Introduction

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be independent replicates of the random vector $(X, Y)$, where $Y$ is a univariate
dependent variable and $X$ is a one-dimensional covariate. We assume that $Y$ and $X$ are related via the
semiparametric transformation model

$$\Lambda_{\theta_0}(Y) = m(X) + \varepsilon, \qquad (1.1)$$

where $\varepsilon$ is independent of $X$ and has mean zero. We assume that $\{\Lambda_\theta : \theta \in \Theta\}$ (with $\Theta \subset \mathbb{R}^p$ compact) is
a parametric family of strictly increasing functions defined on an unbounded subset $\mathcal{D}$ in $\mathbb{R}$, while $m$ is the
unknown regression function, belonging to an infinite dimensional parameter set $\mathcal{M}$. We assume that $\mathcal{M}$ is
a space of functions endowed with the norm $\|\cdot\|_{\mathcal{M}} = \|\cdot\|_\infty$. We denote by $\theta_0 \in \Theta$ and $m \in \mathcal{M}$ the true
unknown finite and infinite dimensional parameters. Define the regression function

$$m_\theta(x) = E[\Lambda_\theta(Y) \mid X = x],$$

for each $\theta \in \Theta$, and let $\varepsilon_\theta = \varepsilon(\theta) = \Lambda_\theta(Y) - m_\theta(X)$.

In this paper, we are interested in the estimation of the probability density function (p.d.f.) $f_\varepsilon$ of the
residual term $\varepsilon = \Lambda_{\theta_0}(Y) - m(X)$. To this end, we first obtain the estimators $\hat\theta$ and $\hat m_\theta$ of the parameter
$\theta_0$ and the function $m_\theta$, and second, form the semiparametric regression residuals $\hat\varepsilon_i(\hat\theta) = \Lambda_{\hat\theta}(Y_i) - \hat m_{\hat\theta}(X_i)$.
To estimate $\theta_0$ we use a profile likelihood (PL) approach, developed in Linton, Sperlich and Van Keilegom
(2008), whereas $m_\theta$ is estimated by means of a Nadaraya-Watson-type estimator (Nadaraya, 1964, Watson,
1964). To our knowledge, the estimation of the density of $\varepsilon$ in model (1.1) has not yet been investigated in
the statistical literature. This estimation may be very useful in various regression problems. Indeed, taking
transformations of the data may induce normality and error variance homogeneity in the transformed model,
so the estimated error density in the transformed model may be used for testing these hypotheses.

Taking transformations of the data has been an important part of statistical practice for many years. 
A major contribution to this methodology was made by Box and Cox (1964), who proposed a parametric 
power family of transformations that includes the logarithm and the identity. They suggested that the power 
transformation, when applied to the dependent variable in a linear regression model, might induce normality 
and homoscedasticity. Lots of effort has been devoted to the investigation of the Box-Cox transformation 
since its introduction. See, for example, Amemiya (1985), Horowitz (1998), Chen, Lockhart and Stephens 
(2002), Shin (2008), and Fitzenberger, Wilke and Zhang (2010). Other dependent variable transformations 
have been suggested, for example, the Zellner and Revankar (1969) transform and the Bickel and Doksum 
(1981) transform. The transformation methodology has been quite successful and a large literature exists on 
this topic for parametric models. See Carroll and Ruppert (1988) and Sakia (1992) and references therein. 

The estimation of (functionals of) the error distribution and density under simplified versions of model
(1.1) has received considerable attention in the statistical literature in recent years. Consider e.g. model
(1.1) but with $\Lambda_{\theta_0} = \mathrm{id}$, i.e. the response is not transformed. Under this model, Escanciano and
Jacho-Chavez (2010) considered the estimation of the (marginal) density of the response $Y$ via the estimation
of the error density. Akritas and Van Keilegom (2001) estimated the cumulative distribution function of
the regression error in a heteroscedastic model with univariate covariates. The estimator they proposed is
based on nonparametrically estimated regression residuals. The weak convergence of their estimator was
proved. The results obtained by Akritas and Van Keilegom (2001) have been generalized by Neumeyer
and Van Keilegom (2010) to the case of multivariate covariates. Müller, Schick and Wefelmeyer (2004)
investigated linear functionals of the error distribution in nonparametric regression. Cheng (2005) established 
the asymptotic normality of an estimator of the error density based on estimated residuals. The estimator 
he proposed is constructed by splitting the sample into two parts: the first part is used for the estimation of 
the residuals, while the second part of the sample is used for the construction of the error density estimator. 
Efromovich (2005) proposed an adaptive estimator of the error density, based on a density estimator proposed 
by Pinsker (1980). Finally, Samb (2010) also considered the estimation of the error density, but his approach 
is more closely related to the one in Akritas and Van Keilegom (2001). 

In order to achieve the objective of this paper, namely the estimation of the error density under model
(1.1), we first need to estimate the transformation parameter $\theta_0$. To this end, we make use of the results in
Linton, Sperlich and Van Keilegom (2008). In the latter paper, the authors first discuss the nonparametric
identification of model (1.1), and second, estimate the transformation parameter $\theta_0$ under the considered
model. For the estimation of this parameter, they propose two approaches. The first approach uses a
semiparametric profile likelihood (PL) estimator, while the second is based on a 'mean squared distance
from independence' estimator (MD) using the estimated distributions of $X$, $\varepsilon$ and $(X, \varepsilon)$. Linton, Sperlich
and Van Keilegom (2008) derived the asymptotic distributions of their estimators under certain regularity
conditions, and proved that both estimators of $\theta_0$ are root-n consistent. The authors also showed that, in
practice, the performance of the PL method is better than that of the MD approach. For this reason, the
PL method will be considered in this paper for the estimation of $\theta_0$.

The rest of the paper is organized as follows. Section 2 presents our estimator of the error density and 
groups some notations and technical assumptions. Section 3 describes the asymptotic results of the paper. 
A simulation study is given in Section 4, while Section 5 is devoted to some general conclusions. Finally, the 
proofs of the asymptotic results are collected in Section 6. 



2 Definitions and assumptions 
2.1 Construction of the estimators 

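The approach proposed here for the estimation of $f_\varepsilon$ is based on a two-step procedure. In a first step, we
estimate the finite dimensional parameter $\theta_0$. This parameter is estimated by the profile likelihood (PL)
method, developed in Linton, Sperlich and Van Keilegom (2008). The basic idea of this method is to replace
all unknown expressions in the likelihood function by their nonparametric kernel estimates. Under model
(1.1), we have

$$P(Y \le y \mid X) = P(\Lambda_{\theta_0}(Y) \le \Lambda_{\theta_0}(y) \mid X) = P(\varepsilon_{\theta_0} \le \Lambda_{\theta_0}(y) - m_{\theta_0}(X) \mid X) = F_\varepsilon(\Lambda_{\theta_0}(y) - m_{\theta_0}(X)).$$

Here, $F_\varepsilon(t) = P(\varepsilon \le t)$, and so

$$f_{Y|X}(y|x) = f_\varepsilon(\Lambda_{\theta_0}(y) - m_{\theta_0}(x))\,\Lambda'_{\theta_0}(y),$$

where $f_\varepsilon$ and $f_{Y|X}$ are the densities of $\varepsilon$, and of $Y$ given $X$, respectively. Then, the log likelihood function is

$$\sum_{i=1}^n \left\{ \log f_{\varepsilon_\theta}(\Lambda_\theta(Y_i) - m_\theta(X_i)) + \log \Lambda'_\theta(Y_i) \right\}, \qquad \theta \in \Theta,$$

where $f_{\varepsilon_\theta}$ is the density function of $\varepsilon_\theta$. Now, let

$$\hat m_\theta(x) = \frac{\sum_{j=1}^n \Lambda_\theta(Y_j) K_1\left(\frac{X_j - x}{h}\right)}{\sum_{j=1}^n K_1\left(\frac{X_j - x}{h}\right)} \qquad (2.1)$$

be the Nadaraya-Watson estimator of $m_\theta(x)$, and let

$$\hat f_{\varepsilon_\theta}(t) = \frac{1}{ng} \sum_{i=1}^n K_2\left(\frac{\hat\varepsilon_i(\theta) - t}{g}\right), \qquad (2.2)$$

where $\hat\varepsilon_i(\theta) = \Lambda_\theta(Y_i) - \hat m_\theta(X_i)$. Here, $K_1$ and $K_2$ are kernel functions and $h$ and $g$ are bandwidth sequences.
Then, the PL estimator of $\theta_0$ is defined by

$$\hat\theta = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \left[ \log \hat f_{\varepsilon_\theta}(\Lambda_\theta(Y_i) - \hat m_\theta(X_i)) + \log \Lambda'_\theta(Y_i) \right]. \qquad (2.3)$$

Recall that $\hat m_\theta(X_i)$ converges to $m_\theta(X_i)$ at a slower rate for those $X_i$ which are close to the boundary of
the support $\mathcal{X}$ of the covariate $X$. That is why we assume implicitly that the proposed estimator (2.3) of $\theta_0$
trims the observations $X_i$ outside a subset $\mathcal{X}_0$ of $\mathcal{X}$. Note that we keep the root-n consistency of $\hat\theta$ proved
in Linton, Sperlich and Van Keilegom (2008) by trimming the covariates outside $\mathcal{X}_0$. But in this case, the
resulting asymptotic variance is different from the one obtained in the latter paper.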
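
The first step can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the authors' implementation: the Epanechnikov kernel is used for both $K_1$ and $K_2$, the Manly family is taken as the transformation, the maximisation is done over a user-supplied grid, and the trimming of covariates near the boundary is omitted. The function names (`nw_estimate`, `profile_loglik`, `pl_estimate`) are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel; an assumed choice for both K_1 and K_2."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def manly(y, theta):
    """Manly transformation, used here as an example of the family Lambda_theta."""
    return y if abs(theta) < 1e-12 else np.expm1(theta * y) / theta

def manly_dy(y, theta):
    """Derivative of the Manly transformation with respect to y."""
    return np.exp(theta * y)

def nw_estimate(x_eval, x, z, h):
    """Nadaraya-Watson estimate of E[z | X = x_eval] with bandwidth h."""
    w = epanechnikov((x[None, :] - x_eval[:, None]) / h)
    return (w @ z) / np.maximum(w.sum(axis=1), 1e-300)

def profile_loglik(theta, x, y, h, g):
    """Profile log likelihood: unknown m_theta and f_eps replaced by kernel estimates."""
    lam = manly(y, theta)
    resid = lam - nw_estimate(x, x, lam, h)          # estimated residuals
    # kernel density of the residuals, evaluated at each residual
    k = epanechnikov((resid[None, :] - resid[:, None]) / g)
    dens = k.sum(axis=1) / (len(y) * g)
    return np.sum(np.log(dens) + np.log(manly_dy(y, theta)))

def pl_estimate(x, y, grid, h, g):
    """Grid maximiser of the profile log likelihood (no trimming applied)."""
    values = [profile_loglik(th, x, y, h, g) for th in grid]
    return grid[int(np.argmax(values))]
```

A grid search is used because the criterion need not be concave in $\theta$; a numerical optimiser could replace it when a finer resolution is needed.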

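In a second step, we use the above estimator $\hat\theta$ to build the estimated residuals $\hat\varepsilon_i(\hat\theta) = \Lambda_{\hat\theta}(Y_i) - \hat m_{\hat\theta}(X_i)$.
Then, our proposed estimator $\hat f_{\hat\varepsilon}(t)$ of $f_\varepsilon(t)$ is defined by

$$\hat f_{\hat\varepsilon}(t) = \frac{1}{nb} \sum_{i=1}^n K_3\left(\frac{\hat\varepsilon_i(\hat\theta) - t}{b}\right), \qquad (2.4)$$

where $K_3$ is a kernel function and $b$ is a bandwidth sequence, not necessarily the same as the kernel $K_2$ and
the bandwidth $g$ used in (2.2). Observe that this estimator is a feasible estimator in the sense that it does
not depend on any unknown quantity, as is desirable in practice. This contrasts with the unfeasible ideal
kernel estimator

$$\tilde f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\left(\frac{\varepsilon_i - t}{b}\right), \qquad (2.5)$$

which depends in particular on the unknown regression errors $\varepsilon_i = \varepsilon_i(\theta_0) = \Lambda_{\theta_0}(Y_i) - m(X_i)$. It is however
intuitively clear that $\hat f_{\hat\varepsilon}(t)$ and $\tilde f_\varepsilon(t)$ will be very close for $n$ large enough, as will be illustrated by the results
given in Section 3.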
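
The second step amounts to an ordinary kernel density estimate computed on the estimated residuals. The following Python sketch shows this under assumptions: the Epanechnikov kernel stands in for $K_3$ (it is convenient but does not satisfy the smoothness required in the theory), and `error_density` is a hypothetical name. Passing the true errors instead of the residuals gives the unfeasible estimator.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel; an assumed stand-in for K_3."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def error_density(t, residuals, b, kernel=epanechnikov):
    """Kernel estimate (1 / (n b)) * sum_i K_3((e_i - t) / b) at the points t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    e = np.asarray(residuals, dtype=float)
    u = (e[None, :] - t[:, None]) / b     # rows index evaluation points
    return kernel(u).sum(axis=1) / (e.size * b)
```

For example, `error_density([-1.0, 0.0, 1.0], residuals, b)` returns the three density estimates at $t = -1, 0, 1$ for a given array `residuals` and bandwidth `b`.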

2.2 Notations 

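When there is no ambiguity, we use $\varepsilon$ and $m$ to indicate $\varepsilon_{\theta_0}$ and $m_{\theta_0}$. Moreover, $\mathcal{N}(\theta_0)$ represents a
neighborhood of $\theta_0$. For the kernel $K_j$ ($j = 1, 2, 3$), let $\mu(K_j) = \int v^{q_j} K_j(v)\,dv$ (with $q_j$ as in assumption (A1) below) and let $K_j^{(p)}$ be the $p$th
derivative of $K_j$. For any function $\varphi_\theta(y)$, denote $\dot\varphi_\theta(y) = \partial\varphi_\theta(y)/\partial\theta = (\partial\varphi_\theta(y)/\partial\theta_1, \ldots, \partial\varphi_\theta(y)/\partial\theta_p)^t$ and
$\varphi'_\theta(y) = \partial\varphi_\theta(y)/\partial y$. Also, let $\|A\| = (A^t A)^{1/2}$ be the Euclidean norm of any vector $A$.

For any functions $\tilde m$, $r$, $f$, $\varphi$ and $q$, and any $\theta \in \Theta$, let $s = (\tilde m, r, f, \varphi, q)$, $s_\theta = (m_\theta, \dot m_\theta, f_{\varepsilon_\theta}, f'_{\varepsilon_\theta}, \dot f_{\varepsilon_\theta})$,
$\varepsilon_i(\theta, \tilde m) = \Lambda_\theta(Y_i) - \tilde m(X_i)$, and define

$$G_n(\theta, s) = \frac{1}{n} \sum_{i=1}^n \left\{ \frac{1}{f\{\varepsilon_i(\theta, \tilde m)\}} \Big[ \varphi\{\varepsilon_i(\theta, \tilde m)\}\{\dot\Lambda_\theta(Y_i) - r(X_i)\} + q\{\varepsilon_i(\theta, \tilde m)\} \Big] + \frac{\dot\Lambda'_\theta(Y_i)}{\Lambda'_\theta(Y_i)} \right\},$$

$$G(\theta, s) = E[G_n(\theta, s)] \quad \text{and} \quad \mathcal{G}(\theta_0, s_{\theta_0}) = \frac{\partial}{\partial\theta} G(\theta, s_\theta)\Big|_{\theta = \theta_0}.$$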



2.3 Technical assumptions 



The assumptions we need for the asymptotic results are listed below for convenient reference. 



(A1) The function $K_j$ ($j = 1, 2, 3$) is symmetric, has compact support, $\int v^k K_j(v)\,dv = 0$ for $k = 1, \ldots, q_j - 1$
and $\int v^{q_j} K_j(v)\,dv \neq 0$ for some $q_j \ge 4$, $K_j$ is twice continuously differentiable, and $\int K_3^{(1)}(v)\,dv = 0$.

(A2) The bandwidth sequences $h$, $g$ and $b$ satisfy $nh^{2q_1} = ng^{2q_2} = o(1)$ (where $q_1$ and $q_2$ are defined
in (A1)), $(nb^5)^{-1} = O(1)$, $nb^3h^2(\log h^{-1})^{-2} \to \infty$ and $ng^6(\log g^{-1})^{-2} \to \infty$.



(A3) (i) The support $\mathcal{X}$ of the covariate $X$ is a compact subset of $\mathbb{R}$, and $\mathcal{X}_0$ is a subset with non empty
interior, whose closure is in the interior of $\mathcal{X}$.

(ii) The density $f_X$ is bounded away from zero and infinity on $\mathcal{X}$, and has continuous second order partial
derivatives on $\mathcal{X}$.

(A4) The function $m_\theta(x)$ is twice continuously differentiable with respect to $\theta$ on $\mathcal{X} \times \mathcal{N}(\theta_0)$, and the
functions $m_\theta(x)$ and $\dot m_\theta(x)$ are $q_1$ times continuously differentiable with respect to $x$ on $\mathcal{X} \times \mathcal{N}(\theta_0)$. All
these derivatives are bounded, uniformly in $(x, \theta) \in \mathcal{X} \times \mathcal{N}(\theta_0)$.

(A5) The error $\varepsilon = \Lambda_{\theta_0}(Y) - m(X)$ has finite fourth moment and is independent of $X$.

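(A6) The distribution $F_{\varepsilon_\theta}(t)$ is $q_3 + 1$ (respectively three) times continuously differentiable with respect to
$t$ (respectively $\theta$), and

$$\sup_{\theta, t} \left| \frac{\partial^{k+\ell}}{\partial t^k\, \partial\theta_1^{\ell_1} \cdots \partial\theta_p^{\ell_p}} F_{\varepsilon_\theta}(t) \right| < \infty$$

for all $k$ and $\ell$ such that $0 \le k + \ell \le 2$, where $\ell = \ell_1 + \ldots + \ell_p$ and $\theta = (\theta_1, \ldots, \theta_p)^t$.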



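(A7) The transformation $\Lambda_\theta(y)$ is three times continuously differentiable with respect to both $\theta$ and $y$, and
there exists an $\alpha > 0$ such that

$$E\left[ \sup_{\theta' : \|\theta' - \theta\| \le \alpha} \left| \frac{\partial^{k+\ell}}{\partial y^k\, \partial\theta_1^{\ell_1} \cdots \partial\theta_p^{\ell_p}} \Lambda_{\theta'}(Y) \right| \right] < \infty$$

for all $\theta \in \Theta$, and for all $k$ and $\ell$ such that $0 \le k + \ell \le 3$, where $\ell = \ell_1 + \ldots + \ell_p$ and $\theta = (\theta_1, \ldots, \theta_p)^t$.
Moreover, $\sup_{x \in \mathcal{X}} E[\dot\Lambda^4_{\theta_0}(Y) \mid X = x] < \infty$.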

(A8) For all $\eta > 0$, there exists $\epsilon(\eta) > 0$ such that

$$\inf_{\|\theta - \theta_0\| > \eta} \|G(\theta, s_\theta)\| \ge \epsilon(\eta) > 0.$$

Moreover, the matrix $\mathcal{G}(\theta_0, s_{\theta_0})$ is non-singular.

(A9) (i) $E(\Lambda_{\theta_0}(Y)) = 1$, $\Lambda_{\theta_0}(0) = 0$ and the set $\{x \in \mathcal{X}_0 : m'(x) \neq 0\}$ has nonempty interior.

(ii) Assume that $\phi(x, t) = \dot\Lambda_{\theta_0}(\Lambda_{\theta_0}^{-1}(m(x) + t)) f_\varepsilon(t)$ is continuously differentiable with respect to $t$ for all $x$
and that

$$\sup_{s : |t - s| \le \delta} E\left| \frac{\partial\phi}{\partial s}(X, s) \right| < \infty \qquad (2.6)$$

for all $t \in \mathbb{R}$ and for some $\delta > 0$.

Assumptions (A1), part of (A2), (A3)(ii), (A4) and (A6), (A7) and (A8) are used by Linton, Sperlich
and Van Keilegom (2008) to show that the PL estimator $\hat\theta$ of $\theta_0$ is root-n consistent. The differentiability of
$K_j$ up to second order imposed in assumption (A1) is used to expand the two-step kernel estimator $\hat f_{\hat\varepsilon}(t)$
in (2.4) around the unfeasible one $\tilde f_\varepsilon(t)$. Assumptions (A3)(ii) and (A4) impose that all the functions to
be estimated have bounded derivatives. The last assumption in (A2) is useful for obtaining the uniform
convergence of the Nadaraya-Watson estimator of $m_{\theta_0}$ in (2.1) (see for instance Einmahl and Mason, 2005).
This assumption is also needed in the study of the difference between the feasible estimator $\hat f_{\hat\varepsilon}(t)$ and
the unfeasible estimator $\tilde f_\varepsilon(t)$. Finally, (A9)(i) is needed for identifying the model (see Vanhems and Van
Keilegom (2011)).

3 Asymptotic results 

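In this section we are interested in the asymptotic behavior of the estimator $\hat f_{\hat\varepsilon}(t)$. To this end, we first
investigate its asymptotic representation, which will be needed to show its asymptotic normality.

Theorem 3.1. Assume (A1)-(A9). Then,

$$\hat f_{\hat\varepsilon}(t) - f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\left(\frac{\varepsilon_i - t}{b}\right) - f_\varepsilon(t) + R_n(t),$$

where $R_n(t) = o_P((nb)^{-1/2})$ for all $t \in \mathbb{R}$.

This result is important, since it shows that, provided the bias term is negligible, the estimation of $\theta_0$
and $m(\cdot)$ has asymptotically no effect on the behavior of the estimator $\hat f_{\hat\varepsilon}(t)$. Therefore, this estimator is
asymptotically equivalent to the unfeasible estimator $\tilde f_\varepsilon(t)$, based on the unknown true errors $\varepsilon_1, \ldots, \varepsilon_n$.

Our next result gives the asymptotic normality of the estimator $\hat f_{\hat\varepsilon}(t)$.

Theorem 3.2. Assume (A1)-(A9). In addition, assume that $nb^{2q_3+1} = O(1)$. Then,

$$\sqrt{nb}\left( \hat f_{\hat\varepsilon}(t) - \bar f_\varepsilon(t) \right) \xrightarrow{d} N\left(0,\ f_\varepsilon(t) \int K_3^2(v)\,dv\right),$$

where

$$\bar f_\varepsilon(t) = f_\varepsilon(t) + \frac{b^{q_3}}{q_3!} f_\varepsilon^{(q_3)}(t) \int v^{q_3} K_3(v)\,dv.$$

The proofs of Theorems 3.1 and 3.2 are given in Section 6.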
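
One possible use of the asymptotic normality result, sketched here only as an illustration, is a pointwise plug-in interval for $f_\varepsilon(t)$. The sketch assumes that the bias term is negligible and replaces the limiting variance $f_\varepsilon(t) \int K_3^2(v)\,dv$ by its estimate; the function name and the default $\int K_3^2(v)\,dv = 0.6$ (the value for the Epanechnikov kernel) are assumptions, not part of the paper.

```python
import numpy as np
from statistics import NormalDist

def pointwise_interval(f_hat, n, b, kernel_l2=0.6, level=0.95):
    """Interval f_hat +/- z * sqrt(f_hat * int K^2 / (n b)); the bias is ignored."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # standard normal quantile
    half = z * np.sqrt(np.maximum(f_hat, 0.0) * kernel_l2 / (n * b))
    return f_hat - half, f_hat + half
```

The interval is only meaningful for a fixed $t$; it says nothing about uniform coverage over a range of $t$ values.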

4 Simulations 

In this section, we investigate the performance of our method for different models and different sample sizes.
Consider

$$\Lambda_{\theta_0}(Y) = b_0 + b_1 X^2 + b_2 \sin(\pi X) + \sigma_e \varepsilon, \qquad (4.1)$$

where $\Lambda_\theta$ is the Manly (1976) transformation

$$\Lambda_\theta(y) = \begin{cases} \dfrac{e^{\theta y} - 1}{\theta}, & \theta \neq 0, \\ y, & \theta = 0, \end{cases}$$

$\theta_0 \in [-0.5, 1.5]$, $X$ is uniformly distributed on the interval $[-0.5, 0.5]$, and $\varepsilon$ is independent of $X$ and has a
standard normal distribution but restricted to the interval $[-3, 3]$. We study three different model settings.
For each of them, $b_0 = 3\sigma_e + b_2$. The other parameters are chosen as follows:

Model 1: $b_1 = 5$, $b_2 = 2$, $\sigma_e = 1.5$;

Model 2: $b_1 = 3.5$, $b_2 = 1.5$, $\sigma_e = 1$;

Model 3: $b_1 = 2.5$, $b_2 = 1$, $\sigma_e = 0.5$.

The parameters and the error distribution have been chosen in such a way that the transformation
$\Lambda_{\theta_0}(Y)$ is positive, to avoid problems when generating the variable $Y$. Our simulations are done for $\theta_0 = 0$,
0.5 or 1. The estimator of $\theta_0$ is chosen from a grid on the interval $[-0.5, 1.5]$ with step size 0.0625. We used
the kernel $K(x) = \frac{3}{4}(1 - x^2)\,1(|x| \le 1)$ for both the regression function and the density estimators. The
results are based on 100 random samples of size $n = 50$ or $n = 100$, and we worked with the bandwidths
$h = 0.3 \times n^{-1/5}$ and $b = g = r_n$, where $r_n = 1.06 \times \mathrm{std}(\hat\varepsilon) \times n^{-1/5}$, which is Silverman's (1986) rule of thumb
bandwidth for univariate density estimation. Here $\mathrm{std}(\hat\varepsilon)$ is the average of the standard deviations of $\hat\varepsilon$ over
the 100 samples.
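
The data-generating mechanism of this section can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes the Manly transformation $\Lambda_\theta(y) = (e^{\theta y} - 1)/\theta$ (with $\Lambda_0(y) = y$), the intercept $b_0 = 3\sigma_e + b_2$, and a truncated normal error obtained by rejection; the function names are hypothetical, and the rule-of-thumb bandwidth uses the standard deviation of a single sample instead of the average over all samples.

```python
import numpy as np

def manly_inverse(z, theta):
    """Inverse of the Manly transformation (requires 1 + theta * z > 0)."""
    return z if abs(theta) < 1e-12 else np.log1p(theta * z) / theta

def truncated_normal(rng, size, bound=3.0):
    """Standard normal draws restricted to [-bound, bound] by rejection."""
    draws = rng.standard_normal(size)
    bad = np.abs(draws) > bound
    while bad.any():
        draws[bad] = rng.standard_normal(bad.sum())
        bad = np.abs(draws) > bound
    return draws

def simulate(n, theta0, b1, b2, sigma_e, rng):
    """Draw (X, Y, eps) from the simulation model with b0 = 3 * sigma_e + b2."""
    b0 = 3.0 * sigma_e + b2
    x = rng.uniform(-0.5, 0.5, n)
    eps = truncated_normal(rng, n)
    z = b0 + b1 * x**2 + b2 * np.sin(np.pi * x) + sigma_e * eps  # transformed response
    return x, manly_inverse(z, theta0), eps

def silverman_bandwidth(residuals):
    """Rule-of-thumb bandwidth 1.06 * std * n^(-1/5) from one sample."""
    r = np.asarray(residuals, dtype=float)
    return 1.06 * r.std(ddof=1) * r.size ** (-0.2)
```

For instance, `simulate(50, 0.5, 5.0, 2.0, 1.5, np.random.default_rng(0))` draws one sample of size 50 under Model 1 with $\theta_0 = 0.5$. Because the transformed response is nonnegative by construction, the inverse transformation is well defined for the nonnegative values of $\theta_0$ used in the study.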

Table 1 shows the values of the mean, standard deviation and mean squared error of $\hat\theta$ for the considered
models, sample sizes and values of $\theta_0$. We observe that the results for the different models are quite similar,
and as expected, the results are better for $n = 100$ than for $n = 50$.

Table 2 shows the mean squared error (MSE) of the estimator $\hat f_{\hat e}(t)$ of the density of the standardized
(pseudo-estimated) error $\hat e = (\Lambda_{\hat\theta}(Y) - \hat m_{\hat\theta}(X))/\sigma_e$, for sample sizes $n = 50$ and $n = 100$ and for $t = -1$, 0 and
1. Results for $\hat f_{\hat\varepsilon}(t)$ have also been obtained, but are not reported here. Indeed, Figure 1, displaying $\hat f_{\hat e}(t)$,
shows that, even though residuals are standardized for each simulation (with known $\sigma_e$), better behavior
is observed for models with smaller $\sigma_e$. Moreover, we observe that for $\theta_0 = 0$ there is very little difference
between the curve of $\hat f_{\hat e}$ and the one of the standard normal density. On the other hand, for $\theta_0 = 0.5$ and
$\theta_0 = 1$, we notice an important difference between the two curves under Models 1 and 2, but the difference is
less important under Model 3.



                     mean($\hat\theta$)                    std($\hat\theta$)                     MSE($\hat\theta$)
  n    $\theta_0$   Model 1  Model 2  Model 3    Model 1  Model 2  Model 3    Model 1  Model 2  Model 3
  50   0     0.0063   0.0065   0.0071     0.0116   0.0161   0.0239     0.0064   0.0124   0.0277
       0.5   0.3787   0.3754   0.3907     0.0417   0.0438   0.0486     0.0783   0.0867   0.1140
       1     0.8197   0.8449   0.8658     0.0792   0.0796   0.0798     0.3506   0.3492   0.3409
  100  0     0.0055   0.0148   0.0170     0.0057   0.0078   0.0116     0.0032   0.0059   0.0132
       0.5   0.4596   0.4621   0.4728     0.0246   0.254    0.0270     0.0634   0.0676   0.0752
       1     0.9196   0.9545   0.9749     0.0401   0.0437   0.0438     0.2092   0.1999   0.1637

Table 1: Approximation of the mean, the standard deviation and the mean squared error of $\hat\theta$ for the three
regression models. All numbers are calculated based on 100 random samples.

5 Conclusions 

In this paper we have studied the estimation of the density of the error in a semiparametric transformation 
model. The regression function in this model is unspecified (except for some smoothness assumptions), 
whereas the transformation (of the dependent variable in the model) is supposed to belong to a parametric 
family of monotone transformations. The proposed estimator is a kernel-type estimator, and we have shown 
its asymptotic normality. The finite sample performance of the estimator is illustrated by means of a 
simulation study. 

It would be interesting to explore various possible applications of the results in this paper. For example, 
one could use the results on the estimation of the error density to test hypotheses concerning e.g. the normality 
of the errors, the homoscedasticity of the error variance, or the linearity of the regression function, all of
which are important features in the context of transformation models.

6 Proofs 

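Proof of Theorem 3.1. Write

$$\hat f_{\hat\varepsilon}(t) - f_\varepsilon(t) = [\hat f_{\hat\varepsilon}(t) - \hat f_\varepsilon(t)] + [\hat f_\varepsilon(t) - f_\varepsilon(t)],$$

where

$$\hat f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\left(\frac{\hat\varepsilon_i - t}{b}\right)$$

and $\hat\varepsilon_i = \Lambda_{\theta_0}(Y_i) - \hat m_{\theta_0}(X_i)$, $i = 1, \ldots, n$. In a completely similar way as was done for Lemma A.1 in Linton,
Sperlich and Van Keilegom (2008), it can be shown that

$$\hat f_\varepsilon(t) - f_\varepsilon(t) = \frac{1}{nb} \sum_{i=1}^n K_3\left(\frac{\varepsilon_i - t}{b}\right) - f_\varepsilon(t) + o_P((nb)^{-1/2}) \qquad (6.1)$$

for all $t \in \mathbb{R}$. Note that the remainder term in Lemma A.1 in the above paper equals a sum of i.i.d. terms of
mean zero, plus an $o_P(n^{-1/2})$ term. Hence, the remainder term in that paper is $O_P(n^{-1/2})$, whereas we write
$o_P((nb)^{-1/2})$ in (6.1). Therefore, the result of the theorem follows if we prove that $\hat f_{\hat\varepsilon}(t) - \hat f_\varepsilon(t) = o_P((nb)^{-1/2})$.
To this end, write

$$\hat f_{\hat\varepsilon}(t) - \hat f_\varepsilon(t) = \frac{1}{nb^2} \sum_{i=1}^n (\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0))\, K_3^{(1)}\left(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\right) + \frac{1}{2nb^3} \sum_{i=1}^n (\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0))^2\, K_3^{(2)}\left(\frac{\hat\varepsilon_i(\theta_0) + \beta(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)) - t}{b}\right)$$

for some $\beta \in (0, 1)$. In what follows, we will show that each of the terms above is $o_P((nb)^{-1/2})$. First consider
the last term of this expansion. Since $\Lambda_\theta(y)$ and $\hat m_\theta(x)$ are both twice continuously differentiable with respect to $\theta$,
the second order Taylor expansion gives, for some $\theta_1$ between $\theta_0$ and $\hat\theta$ (to simplify the notations, we assume
here that $p = \dim(\theta) = 1$),

$$\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0) = \Lambda_{\hat\theta}(Y_i) - \Lambda_{\theta_0}(Y_i) - (\hat m_{\hat\theta}(X_i) - \hat m_{\theta_0}(X_i))$$

$$= (\hat\theta - \theta_0)(\dot\Lambda_{\theta_0}(Y_i) - \hat{\dot m}_{\theta_0}(X_i)) + \frac{1}{2}(\hat\theta - \theta_0)^2(\ddot\Lambda_{\theta_1}(Y_i) - \hat{\ddot m}_{\theta_1}(X_i)).$$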



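Therefore, since $\hat\theta - \theta_0 = o_P((nb)^{-1/2})$ by Theorem 4.1 in Linton, Sperlich and Van Keilegom (2008) (as
before, we work with a slower rate than what is shown in the latter paper, since this leads to weaker conditions
on the bandwidths), assumptions (A2) and (A7) imply that

$$\frac{1}{2nb^3} \sum_{i=1}^n (\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0))^2\, K_3^{(2)}\left(\frac{\hat\varepsilon_i(\theta_0) + \beta(\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)) - t}{b}\right) = O_P((nb^3)^{-1}),$$

which is $o_P((nb)^{-1/2})$, since $(nb^5)^{-1} = O(1)$ under (A2). For the first term of the expansion, the decomposition of
$\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0)$ given above yields

$$\frac{1}{nb^2} \sum_{i=1}^n (\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0))\, K_3^{(1)}\left(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\right)$$

$$= \frac{(\hat\theta - \theta_0)}{nb^2} \sum_{i=1}^n (\dot\Lambda_{\theta_0}(Y_i) - \hat{\dot m}_{\theta_0}(X_i))\, K_3^{(1)}\left(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\right) + o_P((nb)^{-1/2})$$

$$= \frac{(\hat\theta - \theta_0)}{nb^2} \sum_{i=1}^n (\dot\Lambda_{\theta_0}(Y_i) - \dot m_{\theta_0}(X_i))\, K_3^{(1)}\left(\frac{\varepsilon_i - t}{b}\right) + o_P((nb)^{-1/2}), \qquad (6.2)$$

where the last equality follows from a Taylor expansion applied to $K_3^{(1)}$, the fact that

$$\hat{\dot m}_{\theta_0}(x) - \dot m_{\theta_0}(x) = O_P((nh)^{-1/2}(\log h^{-1})^{1/2}),$$

uniformly in $x \in \mathcal{X}_0$ by Lemma 6.1, and the fact that $nhb^2(\log h^{-1})^{-1} \to \infty$ under (A2). Further, write

$$E\left[ \sum_{i=1}^n (\dot\Lambda_{\theta_0}(Y_i) - \dot m_{\theta_0}(X_i))\, K_3^{(1)}\left(\frac{\varepsilon_i - t}{b}\right) \right] = E\left[ \sum_{i=1}^n \dot\Lambda_{\theta_0}(Y_i)\, K_3^{(1)}\left(\frac{\varepsilon_i - t}{b}\right) \right] - \sum_{i=1}^n E[\dot m_{\theta_0}(X_i)]\, E\left[ K_3^{(1)}\left(\frac{\varepsilon_i - t}{b}\right) \right].$$

We will only show that the first term above is $O(nb^2)$ for any $t \in \mathbb{R}$. The proof for the other term is similar.
Let $\varphi(x, t) = \dot\Lambda_{\theta_0}(\Lambda_{\theta_0}^{-1}(m(x) + t))$ and set $\phi(x, t) = \varphi(x, t) f_\varepsilon(t)$. Then, applying a Taylor expansion to $\phi(x, \cdot)$,
it follows that (for some $\beta \in (0, 1)$)

$$E\left[ \sum_{i=1}^n \dot\Lambda_{\theta_0}(Y_i)\, K_3^{(1)}\left(\frac{\varepsilon_i - t}{b}\right) \right] = \sum_{i=1}^n E\left[ \dot\Lambda_{\theta_0}\left(\Lambda_{\theta_0}^{-1}(m(X_i) + \varepsilon_i)\right) K_3^{(1)}\left(\frac{\varepsilon_i - t}{b}\right) \right]$$

$$= n \int\!\!\int \phi(x, e)\, K_3^{(1)}\left(\frac{e - t}{b}\right) f_X(x)\,dx\,de$$

$$= nb \int\!\!\int \phi(x, t + bv)\, K_3^{(1)}(v)\, f_X(x)\,dx\,dv$$

$$= nb \int\!\!\int \left[ \phi(x, t) + bv \frac{\partial\phi}{\partial t}(x, t + \beta bv) \right] K_3^{(1)}(v)\, f_X(x)\,dx\,dv$$

$$= nb^2 \int\!\!\int v \frac{\partial\phi}{\partial t}(x, t + \beta bv)\, K_3^{(1)}(v)\, f_X(x)\,dx\,dv,$$

since $\int K_3^{(1)}(v)\,dv = 0$, and this is bounded by $K nb^2 \sup_{s : |t - s| \le \delta} E\left|\frac{\partial\phi}{\partial s}(X, s)\right| = O(nb^2)$ by assumption
(A9)(ii). Hence, Tchebychev's inequality ensures that

$$\frac{(\hat\theta - \theta_0)}{nb^2} \sum_{i=1}^n \dot\Lambda_{\theta_0}(Y_i)\, K_3^{(1)}\left(\frac{\varepsilon_i - t}{b}\right) = o_P((nb)^{-1/2}),$$

since $nb^{3/2} \to \infty$ by (A2). Substituting this in (6.2) yields

$$\frac{1}{nb^2} \sum_{i=1}^n (\hat\varepsilon_i(\hat\theta) - \hat\varepsilon_i(\theta_0))\, K_3^{(1)}\left(\frac{\hat\varepsilon_i(\theta_0) - t}{b}\right) = o_P((nb)^{-1/2})$$

for any $t \in \mathbb{R}$. This completes the proof.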



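Proof of Theorem 3.2. It follows from Theorem 3.1 that

$$\hat f_{\hat\varepsilon}(t) - f_\varepsilon(t) = [\tilde f_\varepsilon(t) - E\tilde f_\varepsilon(t)] + [E\tilde f_\varepsilon(t) - f_\varepsilon(t)] + o_P((nb)^{-1/2}). \qquad (6.3)$$

The first term on the right hand side of (6.3) is treated by Lyapounov's Central Limit Theorem (LCT) for
triangular arrays (see e.g. Billingsley 1968, Theorem 7.3). To this end, let

$$\tilde f_{in}(t) = \frac{1}{b} K_3\left(\frac{\varepsilon_i - t}{b}\right), \qquad \text{so that} \quad \tilde f_\varepsilon(t) = \frac{1}{n} \sum_{i=1}^n \tilde f_{in}(t).$$

Then, under (A1), (A2) and (A5) it can be easily shown that

$$\frac{\sum_{i=1}^n E\left| \tilde f_{in}(t) - E\tilde f_{in}(t) \right|^3}{\left( \sum_{i=1}^n \mathrm{Var}\,\tilde f_{in}(t) \right)^{3/2}} \le \frac{C nb^{-2} f_\varepsilon(t) \int |K_3(v)|^3\,dv + o(nb^{-2})}{\left( nb^{-1} f_\varepsilon(t) \int K_3^2(v)\,dv + o(nb^{-1}) \right)^{3/2}} = O((nb)^{-1/2}) = o(1)$$

for some $C > 0$. Hence, the LCT ensures that

$$\frac{\tilde f_\varepsilon(t) - E\tilde f_\varepsilon(t)}{\sqrt{\mathrm{Var}\,\tilde f_\varepsilon(t)}} = \frac{\tilde f_\varepsilon(t) - E\tilde f_\varepsilon(t)}{\sqrt{n^{-1}\,\mathrm{Var}\,\tilde f_{1n}(t)}} \xrightarrow{d} N(0, 1).$$

This gives

$$\sqrt{nb}\left[ \tilde f_\varepsilon(t) - E\tilde f_\varepsilon(t) \right] \xrightarrow{d} N\left(0,\ f_\varepsilon(t) \int K_3^2(v)\,dv\right). \qquad (6.4)$$

For the second term of (6.3), straightforward calculations show that

$$E\tilde f_\varepsilon(t) - f_\varepsilon(t) = \frac{b^{q_3}}{q_3!} f_\varepsilon^{(q_3)}(t) \int v^{q_3} K_3(v)\,dv + o(b^{q_3}).$$

Combining this with (6.4) and (6.3), we obtain the desired result.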
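Lemma 6.1. Assume (A1)-(A5) and (A7). Then,

$$\sup_{x \in \mathcal{X}_0} |\hat m_{\theta_0}(x) - m_{\theta_0}(x)| = O_P((nh)^{-1/2}(\log h^{-1})^{1/2}),$$

$$\sup_{x \in \mathcal{X}_0} |\hat{\dot m}_{\theta_0}(x) - \dot m_{\theta_0}(x)| = O_P((nh)^{-1/2}(\log h^{-1})^{1/2}).$$

Proof. We will only show the proof for $\hat{\dot m}_{\theta_0}(x) - \dot m_{\theta_0}(x)$, the proof for $\hat m_{\theta_0}(x) - m_{\theta_0}(x)$ being very similar.
Let $c_n = (nh)^{-1/2}(\log h^{-1})^{1/2}$, and define

$$\hat{\dot r}_{\theta_0}(x) = \frac{1}{nh} \sum_{j=1}^n \dot\Lambda_{\theta_0}(Y_j) K_1\left(\frac{X_j - x}{h}\right), \qquad \bar{\dot r}_{\theta_0}(x) = E[\hat{\dot r}_{\theta_0}(x)], \qquad \bar f_X(x) = E[\hat f_X(x)],$$

where $\hat f_X(x) = (nh)^{-1} \sum_{j=1}^n K_1\left(\frac{X_j - x}{h}\right)$. Then,

$$\sup_{x \in \mathcal{X}_0} |\hat{\dot m}_{\theta_0}(x) - \dot m_{\theta_0}(x)| \le \sup_{x \in \mathcal{X}_0} \left| \hat{\dot m}_{\theta_0}(x) - \frac{\bar{\dot r}_{\theta_0}(x)}{\bar f_X(x)} \right| + \sup_{x \in \mathcal{X}_0} \frac{1}{\bar f_X(x)} \left| \bar{\dot r}_{\theta_0}(x) - \bar f_X(x) \dot m_{\theta_0}(x) \right|. \qquad (6.5)$$

Since $E[\dot\Lambda^4_{\theta_0}(Y) \mid X = x] < \infty$ uniformly in $x \in \mathcal{X}$ by assumption (A7), a similar proof as was given for
Theorem 2 in Einmahl and Mason (2005) ensures that

$$\sup_{x \in \mathcal{X}_0} \left| \hat{\dot m}_{\theta_0}(x) - \frac{\bar{\dot r}_{\theta_0}(x)}{\bar f_X(x)} \right| = O_P(c_n).$$

Consider now the second term of (6.5). Since $E[\dot\varepsilon(\theta_0) \mid X] = 0$, where $\dot\varepsilon(\theta_0) = \frac{d}{d\theta}(\Lambda_\theta(Y) - m_\theta(X))|_{\theta = \theta_0}$, we
have

$$\bar{\dot r}_{\theta_0}(x) = h^{-1} E\left[ \dot\Lambda_{\theta_0}(Y) K_1\left(\frac{X - x}{h}\right) \right] = h^{-1} E\left[ \dot m_{\theta_0}(X) K_1\left(\frac{X - x}{h}\right) \right] = \int \dot m_{\theta_0}(x + hv) K_1(v) f_X(x + hv)\,dv,$$

from which it follows that

$$\bar{\dot r}_{\theta_0}(x) - \bar f_X(x) \dot m_{\theta_0}(x) = \int [\dot m_{\theta_0}(x + hv) - \dot m_{\theta_0}(x)] K_1(v) f_X(x + hv)\,dv.$$

Hence, a Taylor expansion applied to $\dot m_{\theta_0}(\cdot)$ yields

$$\sup_{x \in \mathcal{X}_0} \left| \bar{\dot r}_{\theta_0}(x) - \bar f_X(x) \dot m_{\theta_0}(x) \right| = O(h^{q_1}) = O(c_n),$$

since $nh^{2q_1+1}(\log h^{-1})^{-1} = O(1)$ by (A2). This proves that the second term of (6.5) is $O(c_n)$, since it can
be easily shown that $\bar f_X(x)$ is bounded away from 0 and infinity, uniformly in $x \in \mathcal{X}_0$, using (A3)(ii). □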
                    Mean squared error of $\hat f_{\hat e}(t)$
  n    $\theta_0$   t     Model 1   Model 2   Model 3
  50   0     -1    0.0026    0.0025    0.0019
              0    0.0040    0.0037    0.0028
              1    0.0026    0.0023    0.0017
       0.5   -1    0.0063    0.0048    0.0025
              0    0.0527    0.0372    0.0147
              1    0.0062    0.0046    0.0020
       1     -1    0.0078    0.0048    0.0024
              0    0.0564    0.0314    0.0133
              1    0.0049    0.0030    0.0019
  100  0     -1    0.0012    0.0011    0.0008
              0    0.0039    0.0035    0.0026
              1    0.0017    0.0015    0.0012
       0.5   -1    0.0015    0.0014    0.0011
              0    0.0075    0.0057    0.0031
              1    0.0021    0.0018    0.0012
       1     -1    0.0024    0.0018    0.0012
              0    0.0110    0.0052    0.0270
              1    0.0019    0.0016    0.0012

Table 2: Mean squared error of $\hat f_{\hat e}(t)$ for three regression models. All numbers are calculated based on 100
random samples.